2021-03-19

The Melbourne Housing Snapshot Dataset

  • Home Sales in 2017
    • Location
    • Construction
    • Sale
  • Variables: 21
    • Numeric: 12
    • Categorical: 9

Summary of Select Numeric Variables

            Price   Rooms   BuildingArea
Min         85000    1.00              0
Q1         650000    2.00             93
Median     903000    3.00            126
Mean      1075684    2.94            152
Q3        1330000    3.00            174
Max       9000000   10.00          44515
NA's                                6450

Select Data Pairs

Correlations

Map of Melbourne Sales

Selling Price

Log Selling Price

Price by Region

Price by Number of Rooms

Price by Type of Home

Price by Region and Type

If we keep this, consider standardizing the legend colors so that each region has the same color in every chart.

First Attempt at Linear Model

Remove the Variable with Highest VIF
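
The slides do not show how the VIF values were computed. As a minimal sketch, assuming NumPy and a purely numeric predictor matrix, the variance inflation factor of predictor j can be obtained by regressing it on the remaining predictors and taking 1 / (1 - R_j^2), which equals TSS / RSS for that auxiliary regression. The function name `vif` and the toy columns below are hypothetical and are not taken from the housing data.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of a numeric matrix X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all other columns.
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        rss = float(resid @ resid)
        tss = float(((y - y.mean()) ** 2).sum())
        # VIF = 1 / (1 - R^2) = TSS / RSS; infinite under perfect collinearity.
        factors.append(float("inf") if rss == 0 else tss / rss)
    return factors

# Hypothetical toy predictors: c is nearly a copy of a, b is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)

vifs = vif(np.column_stack([a, b, c]))
worst = int(np.argmax(vifs))  # index of the candidate column to drop
```

Dropping the column at `worst` and recomputing would repeat the step named on this slide. A common rule of thumb flags values above roughly 5 to 10, though the cutoff is a judgment call.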

Model Coefficients

Consider Interactions

Remove Land Size

Model 4 Coefficients

Residual Analysis - Homogeneity? No

Residual Analysis - Normal? Not Quite

Residual Analysis - Influence? Yes

Remove Influence Points

Testing \(R^2\)

\[ R^2 = 1 - \dfrac{RSS}{TSS} = 0.441 \]
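
To make the formula concrete, here is a small sketch that evaluates R^2 from observed and predicted values, where RSS is the sum of squared residuals and TSS is the total sum of squares around the mean of the observed values. The function name `r_squared` and the four toy values are hypothetical; they are not the housing data and do not reproduce the 0.441 reported above.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - RSS / TSS for observed values y and predictions y_hat."""
    mean_y = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    tss = sum((yi - mean_y) ** 2 for yi in y)              # total sum of squares
    return 1 - rss / tss

# Hypothetical observed and predicted values, for illustration only.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_fit = [1.5, 1.5, 3.5, 3.5]
r2 = r_squared(y_obs, y_fit)  # RSS = 1.0, TSS = 5.0
```

When the predictions come from held-out data, TSS is computed around the mean of the held-out observations, and the resulting value can be negative if the model predicts worse than that mean.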

Transform Data - Homogeneity

Transform Data - Normality

Future Work

  • Further explore log transformation
  • Consider GLM
  • What to do about factors with many levels (hundreds)?
  • Missing data
  • Improve Prediction